Refining the Structure of a Stochastic Context-Free Grammar

Authors

  • Joseph Bockhorst
  • Mark Craven
Abstract

We present a machine learning algorithm for refining the structure of a stochastic context-free grammar (SCFG). This algorithm consists of a heuristic for identifying structural errors and an operator for fixing them. The heuristic identifies nonterminals in the model SCFG that appear to be performing the function of two or more nonterminals in the target SCFG, and the operator attempts to rectify this problem by introducing a new nonterminal. Structural refinement is important because most common SCFG learning methods set the probability parameters while leaving the structure of the grammar fixed. Thus, any structural errors introduced prior to training will persist. We present experiments that show our approach is able to significantly improve the accuracy of an SCFG designed to model an important class of RNA sequences called terminators.
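
As a rough illustration of what "introducing a new nonterminal" could look like, the following is a minimal Python sketch. The abstract does not specify the grammar representation, the heuristic, or the exact form of the operator, so everything here is an assumption made for illustration: the dictionary layout, the function names `split_nonterminal` and `is_normalized`, and the toy grammar are all hypothetical and are not the authors' implementation. The sketch covers only the structural step (one chosen occurrence of an overloaded nonterminal is redirected to a fresh copy); it does not implement the error-identification heuristic.

```python
from copy import deepcopy

# Hypothetical representation: nonterminal -> list of (right-hand side, probability).
# Symbols that are keys of the dict are nonterminals; all other symbols are terminals.


def split_nonterminal(grammar, target, new_name, lhs, prod_index, position):
    """Return a refined copy of `grammar` in which one occurrence of `target`
    (at `position` in production `prod_index` of `lhs`) is replaced by a fresh
    nonterminal `new_name` whose productions start as a copy of `target`'s."""
    if new_name in grammar:
        raise ValueError(f"{new_name!r} is already a nonterminal")
    rhs, prob = grammar[lhs][prod_index]
    if rhs[position] != target:
        raise ValueError(f"{target!r} does not occur at the given position")

    refined = deepcopy(grammar)  # leave the input grammar untouched
    # The new nonterminal inherits the target's productions and probabilities,
    # so the refined grammar initially defines the same distribution.
    refined[new_name] = list(grammar[target])
    new_rhs = rhs[:position] + (new_name,) + rhs[position + 1:]
    refined[lhs][prod_index] = (new_rhs, prob)
    return refined


def is_normalized(grammar, tol=1e-9):
    """Check that each nonterminal's production probabilities sum to 1."""
    return all(
        abs(sum(p for _, p in prods) - 1.0) <= tol for prods in grammar.values()
    )


# Toy grammar (assumed): X is used in two different contexts under S.
toy = {
    "S": [(("a", "X", "u"), 0.5), (("c", "X", "g"), 0.5)],
    "X": [(("a",), 0.6), (("c",), 0.4)],
}

# Give the second context its own nonterminal, X2.
refined = split_nonterminal(toy, "X", "X2", lhs="S", prod_index=1, position=1)
```

In this sketch the split leaves the generated distribution unchanged; the point of the extra nonterminal is that a subsequent parameter-learning pass can fit `X` and `X2` separately, which a fixed-structure learner could not do with the original grammar. Note that if `target` were recursive, the copied productions would still refer to `target` rather than `new_name`; handling that case is left out here.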

Similar resources

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of treebanks is becoming more widely appreciated in linguistics research as a whole. F...

An Application of Stochastic Context Sensitive Grammar Induction to Transfer Learning

We generalize Solomonoff’s stochastic context-free grammar induction method to context-sensitive grammars, and apply it to the transfer learning problem by means of an efficient update algorithm. The stochastic grammar serves as a guiding program distribution which improves future probabilistic induction approximations by learning about the training sequence of problems. Stochastic grammar is updat...

Stochastic k-Tree Grammar and Its Application in Biomolecular Structure Modeling

Stochastic context-free grammar (SCFG) has been successful in modeling biomolecular structures, typically RNA secondary structure, for statistical analysis and structure prediction. Context-free grammar rules specify parallel and nested co-occurrences of terminals, and thus are ideal for modeling nucleotide canonical base pairs that constitute the RNA secondary structure. Stochastic grammars h...

Stochastic Context-Free Grammars and RNA Secondary Structure Prediction

This thesis focuses on the prediction of RNA secondary structure using stochastic context-free grammars (SCFG). The RNA secondary structure prediction problem consists of predicting a 2-dimensional structure from a 1-dimensional nucleotide sequence. The theory behind SCFG is explained and an overview of the research literature on various methods in the field of secondary structure prediction is g...

Daniel A. Woods CS 229 Final Project

Current methods model RNA sequence and secondary structure as stochastic context-free grammars, and then use a generative learning model to find the most likely parse (and, therefore, the most likely structure). As we learned in class, discriminative models generally enjoy higher performance than generative learning models. This implies that performance may increase if discriminative learning...

Publication date: 2001